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A Poor PERSON’S POSTERIOR PREDICTIVE CHECKING OF STRUCTURAL EQUATION MODELS 
Abstract 


Posterior Predictive Model Checking (PPMC) is a Bayesian model checking method that 
compares the observed data to (plausible) future observations from the posterior pre- 
dictive distribution. We propose an alternative to PPMC in the context of structural 
equation modeling, which we term the Poor Persons PPMC (PP-PPMC), for the situa- 
tion wherein one cannot afford (or is unwilling) to draw samples from the full posterior. 
Using only byproducts of likelihood-based estimation (maximum likelihood estimate 
and information matrix), the PP-PPMC offers a natural method to handle parameter 
uncertainty in model fit assessment. In particular, a coupling relationship between the 
classical p-values from the model fit chi-square test and the predictive p-values from the 
PP-PPMC method is carefully examined, suggesting that PP-PPMC may offer an alter- 
native, principled approach for model fit assessment. We also illustrate the flexibility of 
the PP-PPMC approach by applying it to case-influence diagnostics. 
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1 Introduction 

The posterior predictive model checking (PPMC) method is a Bayesian model di- 
agnostic tool for assessing the compatibility of a posited model to observed data by 
comparing the observed data to plausible/future observations simulated from the poste- 
rior predictive distribution. This method is predicated upon the idea that, if the model fits 
the data reasonably well, the future/plausible observations should be “similar” to the 
observed data, whereas large discrepancies should be taken as an indication of model 
misspecification. In applications, “similarity” or ’discrepancy” is to be defined (by the 
researcher) in such a way that differences between a critical aspect of the observed data 
and that of the model-implied future observations can be properly measured. 

The use of model-implied future observations in model-data fit evaluations was first 
introduced by Guttman (1967), and its formal definition was given by Rubin (1981, 1984), 
further elaborated by Meng (1994), Gelman, Meng, and Stern (1996). A good didactic 
discussion of the PPMC method in general can be found in Gelman, Carlin, Stern, and 
Rubin (2003, chap. 6). 

In social and behavioral science research, most of the applications of the PPMC 
method can be found in the context of the item response theory (IRT) models in as- 
sessing item fit, person fit, and dimensionality (Hoijtink & Molenaar, 1997; Janssen, 
Tuerlinckx, Meulders, & de Boeck, 2000; Glas & Meijer, 2003; Sinharay, Johnson, & Stern, 
2006; Levy, Mislevy, & Sinharay, 2009; Levy & Svetina, 2011). 

In factor analysis and structural equation modeling (SEM), despite the rapid growth 
of Bayesian approaches (Arminger & Muthén, 1998; Lee & Zhu, 2000; Ansari, Jedidi, & 
Dube, 2002; Lee, 2007; Lee, Song, & Tang, 2007; Palomo, Dunson, & Bollen, 2007), most of 
previous studies focused on Bayesian model building and parameter estimation, paying 
relatively scant attention to the issue of model fit appraisal and diagnostics. Scheines, 
Hoijtink, and Boomsma (1999), Levy (2011), and B. Muthén and Asparouhov (2012) are 


some important exceptions, illustrating key features of Bayesian approaches to model 
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diagnostics and assessing data-model fit of structural equation models. 

In applications of factor analysis and structural equation modeling, standard esti- 
mation and model checking methods are based on the method of maximum likelihood 
(ML). SEM software programs routinely print in the output the ML point estimates for 
the parameters and the associated standard error estimates. The ML point estimates then 
replaces the unknown model parameters to yield model-implied mean vectors and co- 
variance matrices from which chi-square model fit test statistics and various practical fit 
indices such as RMSEA (Root Mean Square Error of Approximation), TLI (Tucker-Lewis 
Index), or SRMR (Standardized Root Mean Square Residual) can be computed. 

In principle, the reference distributions of these test statistics and fit indexes can be 
developed using asymptotic arguments (J6reskog, 1993; Ogasawara, 2001) or with the 
bootstrap (Bollen & Stine, 1993; Efron & Tibshirani, 1994). In the practice of structural 
equation modeling, however, the asymptotic arguments may not be tenable, despite their 
general applicability. For instance, the form of a number of popular model fit indices 
may not be particularly amenable to asymptotic derivations (perhaps with the exception 
of RMSEA). In addition, the development of asymptotic distribution theory for any new 
researcher-defined fit index would require advanced statistical prowess. It is unfair to 
require an average user of structural equation modeling to perform such derivations. On 
the other hand, while the bootstrap method may sidestep many of the inherent issues 
with large-sample arguments, it remains computationally demanding because the model 
under investigation must be fitted to each of the (often hundreds of) bootstrap resamples. 

Even setting practicality aside, classical likelihood-based approaches evaluate the dis- 
crepancy between the observed data and the hypothesized model when the unknown 
model parameters are replaced by the best-fitting point estimates. The parameter esti- 
mation uncertainty, although quantifiable in the form of asymptotic covariance matrix 
(inverse of Fisher information matrix) of the ML parameter estimates, is not accounted 


for in classical likelihood-based model fit assessment. This is made explicit in the appli- 
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cation of the so-called parametric bootstrap method to model fit testing. In parametric 
bootstrap resampling, the resamples are generated from the hypothesized model with all 
parameters replaced by the sample ML estimate (see, e.g., Tollenaar & Mooijaart, 2003). 
The construction of the bootstrapped reference distribution of a fit statistic requires fit- 
ting the hypothesized model to each bootstrap resample. 

By contrast, the PPMC method is a simulation-based model checking method, requir- 
ing neither asymptotic arguments (cf. likelihood-ratio chi-square statistic) nor computa- 
tionally intensive model re-fitting (cf. bootstrap). When the plausible values of the data 
can be drawn from the posterior predictive distribution, constructing reference distribu- 
tions of any test quantity defined by the investigator involves minimal cost. Moreover, 
use of the posterior predictive distribution implies that one must take into account the 
entire posterior distribution of the model’s parameters, rather than the best-fitting point 
estimates only. As such, the PPMC method naturally integrates parameter uncertainty 
into model fit assessment (Gelman et al., 2003; Rupp, Dey, & Zumbo, 2004; Levy, 2011). 

To take full advantage of the PPMC method, however, the researcher must be able to 
simulate draws from the full posterior distribution of the model parameters. If param- 
eter estimation is accomplished with Bayesian sampling based methods, e.g., Markov 
chain Monte Carlo (MCMC; see e.g., Gilks, Richardson, & Spiegelhalter, 1996), samples 
from the posterior predictive distribution can be obtained as a byproduct of the MCMC 
output. On the other hand, for those who chose not to adopt Bayesian methods, are 
unfamiliar with Bayesian methods, or when Bayesian methods are cumbersome, com- 
plexities remain. For example, the choice of prior distribution on variance components 
can be complicated due to its potentially large effect on subsequent inference (Gelman, 
2006). Convergence monitoring of MCMC often requires considerable expertise on the 
part of the user, and in our view should not be fully automated (see discussions in 
Cowles & Carlin, 1996 and MacCallum, Edwards, & Cai, 2012). 


For high-dimensional highly-parameterized latent variable models, numerous au- 
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thors advocated that one of the critical first steps toward a sensible full Bayesian analysis 
in fact lies in the use of a mode-finding method (e.g., ML) for parameter estimation (see 
e.g., Gelman et al., 2003). Recognizing both the enormous advantages as well as the 
potential complications of the Bayesian PPMC methods, and at the same time, given the 
current dominance of likelihood-based methods for parameter estimation and model fit 
testing (due to both history and practicality), we pose a question that provides the guid- 
ing motivation for this research: Can one find a predictive model checking method for 
evaluating models fitted using the method of maximum likelihood? 

In response to this question, we propose a Poor Person’s PPMC method (PP-PPMC), 
which employs byproducts of maximum likelihood estimation, i.e., the ML parameter 
estimates and the associated asymptotic error covariance matrix. This method is termed 
a Poor-Person’s PPMC because we believe that it may provide a computationally effi- 
cient non-iterative (cf. bootstrap) mechanism to conduct predictive model checking that 
directly builds parameter uncertainty into consideration (cf. standard likelihood ratio 
test) for the researcher who, for various reasons, cannot “afford to” or is not (yet) willing 
to draw samples from the full posterior distribution of the model’s parameters. 

The remainder of this paper is organized as follows. First, we introduce the original 
PPMC method to provide the necessary background for discussing the proposed PP- 
PPMC method. Because the PP-PPMC method employs only byproducts of maximum 
likelihood estimation, we closely examine the relationship between the classical p-value 
under the likelihood ratio chi-square test and the p-values under the PP-PPMC method. 
In subsequent sections, we apply the PP-PPMC method to two specific cases: overall 
goodness-of-fit assessment, and case-influence diagnostics. Using empirical and simu- 
lated data, we show that the proposed method can be an effective tool for both overall 
model fit testing and case-influence diagnostics. We conclude the paper with discussions 


about practical implications of the proposed method and future research. 
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2 Posterior Predictive Model Checking 

Let € be some sample space. Throughout the paper we use the notation Y°"S € E to 
represent observed data, H the hypothesized model, and @ the d-dimensional vector of 
unknown model parameters that resides in the parameter space @ which is a subset of 
the d-dimensional Cartesian product of real numbers 6 € © C IR’. We use the notation 
yy”? € € to denote hypothetical replicated data or plausible future observations under 
the hypothesized model H. 
2.1 Classical Estimation and Inference for SEM 

In this paper we consider a general mean and covariance structure model. Let there 
be p observed/manifest variables in the model. The p x 1 vector of manifest variables y 
serve as indicators for a vector of q latent variables y via a factor analytic measurement 
model y = t + Ay + €, where the unique factors in e have zero means and covariance 
matrix ¥. The relationships among the latent variables are described by simultaneous 
equations 4 = « + By + ¢, where the disturbance terms in ¢ have zero means and covari- 
ance matrix ®. Both ¥ and ® are functions of the parameter vector 6. In addition, the 
measurement intercepts tT, the factor loadings A, the latent regression intercepts « and 
the regression coefficients B are also functions of 6. Assuming orthogonality of ¢ and e, 


the following mean and covariance structure model for random vector y can be derived: 


E(y) = w(0)= T+ AB,'a, (1) 


cov(y) = 2(0) = AB,'®(B,!)'A’+¥, (2) 


where B,. = I, — B is assumed to be non-singular, and I, is a q x q identity matrix. 

Let px = p+ p(p+1)/2 denote the number of observed first and second moments. It 
is often convenient to consider the p, x 1 vector of unique model-implied means and co- 
variances 0 (0) = [u(0)', vech(Z(@))']’, where vech(Z) is the half-vectorization operator 


that returns the a vector consisting of the p(p + 1)/2 unique elements of XZ. 
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Specification of a structural equation model H includes the pattern and values of free 
and fixed elements of the parameter matrices as well as additional restrictions on @. In 
the classical model fitting context (see, e.g., Cudeck & Henly, 1991), it is often assumed 
that there exists a true population mean vector fy and a true population covariance 
matrix Lg, and the model is referred to as correctly specified if and only if there exists 
some 09 € @ such that 7(89) = oo = [yg,vech(Zo)’]’.. We assume that the model is 
locally identified such that the p. x d Jacobian matrix 


_ 90 (8) 


A(6) = aa (3) 


has full column rank, at least in a neighborhood of 6p. 

Assuming correct model specification, the model fitting task is reduced to that of 
parameter estimation. Given a random sample Y°"s of size N, the sample mean vec- 
obs 


tor can be written as y = N71 yy 


obs where y?"5 denotes the ith sample observa- 


tion, and the sample covariance matrix (maximum likelihood estimate)! is denoted 
as S = N71yN_,(y9!5 — ¥)(y9’ —¥)'. The p« x 1 sample counterpart to 7(@) is s = 
[y’, vech(S)']’. 

Parameter estimation by the method of normal theory maximum likelihood requires 
the assumption of multivariate normality of the observed variables. After some algebra, 


the log-likelihood function can be written as 


(AY) = C(0|s) « —> {log |2(0)| + te[Z(0)~"S] + fy — (0)]'2(0)-"'y — (0)]} 


Under multivariate normality, the observed first and second moments in s are sufficient 
statistics under model H. Maximization of £(6/Y°"’) with respect to 6 results in the ML 


estimate 6 € 0. Equivalently, one may also choose to minimize the following maximum 


For simplicity we do not use the unbiased sample covariance matrix estimate with (N — 1) as the 
divisor. All discussions assume large N so any difference will be negligible. 
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Wishart likelihood (MWL) fit function: 


Tuwi(¥°"*, 8) = [y — 4(8)]'E(8)*[y — #(8)] 


+ log |Z(@)| — log |S| + tr[Z(@)~!S] — p. (4) 
Let Z(0) be the (observed) information matrix. It is equal to a half times the second 


derivative matrix of the MWL fit function. 


1 0? Twi (¥°"S, 0) 
TNO) 2 0600’ 


By reparameterization, we see that 


07¢(8|¥°%) —  da(8) 07£(8|Y°") da (8) 


NO 0000 ~=~=CSst*«‘“‘ —St«=«C 


=N{A(0)'T(o)A@)}, ©) 


where the ['(a) is the information matrix of the (unstructured) mean and covariance 


parameters in a based on a multivariate normal distribution, 


I(c) = j (6) 
2-'Di, (£1 @x)D, 


and D, is a p? x p(p +1)/2 duplication matrix (see Schott, 2005). The asymptotic dis- 


tribution of /N(s — 7) is normal (see Browne, 1984), 
VN(s — 00) © Np, (0,T 9°), (7) 


under multivariate normality of the observed variables, where the symbol ~ stands for 


“asymptotically (in N) distributed as,” and Tp = I'(a9). 
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The asymptotic distribution of the ML estimator 6 is given by 
VN(6— 60) A N,j(0, Vo), (8) 


(see Yuan & Bentler, 2007), where the right hand denotes a multivariate normal distri- 
bution with zero means and asymptotic covariance matrix equal to the inverse of the 


information matrix Vo = Z(@9)~!. A consistent finite-sample variance estimate is 
¥ = {NZ(6)} = {NA(6)'Fa4)}", (9) 


where f = I(s). In other words the ML estimates are asymptotically normally dis- 
tributed with large-sample error covariance matrix V. 
The principle of Generalized Least Squares (GLS) leads to another widely used fit 


function for parameter estimation and statistical inference: 
1 
Tars(¥", 8) = 5[s — o(8)]/T(s)[s — o(9)], (10) 


Under normality, the GLS and the MWL fit functions lead to asymptotically equivalent 
solutions (Browne, 1974). It is also well known (see Browne & Arminger, 1995) that 
under the multivariate normal sampling model for y, N times the minimized fit function 
value (NTyw (Y's, 6) or NTcrs(Y°"’, 6)) is distributed as a central chi-square variable 
with p, — d degrees-of-freedom under the null hypothesis of correct model specification 
when N tends to infinity. 
2.2 Basic Setup for PPMC 

As mentioned earlier, the essence of PPMC is to compare the observed data Y°"S with 
the replicated data Y'°?, as opposed to focusing on large sample tests. The hypothetical 
replicated data can be simulated from the posterior predictive distribution, namely the 


conditional distribution of the replicated data Y’? given the observed data Y°’S and 
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the posited model H, denoted as p(Y’?|Y°"’,H). The general form of the posterior 


predictive distribution is given as 
p(Y?|¥°%, H) =f p(v'"|a, H)p(olv"™, H)d0 (11) 


Equation (11) shows that the integral defining the posterior predictive distribution con- 
sists of two components. One is p(Y'’?|6, H), or the sampling distribution of the repli- 
cated data Y"’? given particular values of the model parameters 6 under model H. The 
other is p(6|Y°’’, H) or the posterior distribution of model parameters 6 under model 
H given observed data Y°’’. We note that p(6|Y°"’, H) quantifies the plausible values of 
0 after the data have been observed. Therefore, Equation (11) shows explicitly how the 
use of the posterior predictive distribution addresses the problem of unknown model 
parameters in making probabilistic statements about the replicated data; it integrates 
out (averages over) the unknown @ in the sampling distribution of Y'°? over its poste- 
rior distribution p(6/Y°"’, H). As such, the entire posterior distribution of the model’s 
parameters is integrated into the model fit checking procedures. 

The inferential principle of PPMC is similar to that of classical model fit testing, i.e. 
to locate the position of observed data Y?"’ in a reference distribution. If the model fits 
the data well, the observed data YS will not stand out in the reference distribution. 
A key difference between PPMC and classical fit testing lies in choice of the reference 
distribution. Under PPMC, the posterior predictive distribution given in (11) is used 
as the reference distribution, whereas, under classical testing, the sampling distribution 
p(Y|6,H), with @ replaced by the ML estimate 6, provides the basis of reference distri- 
butions for model fit hypothesis tests. 

2.3 Test Quantities and p-values 
In order to measure the degrees and manners in which the observed Y°"s and repli- 


cated data Y’’? are discrepant, appropriate test quantities should be defined (Gelman et 
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al., 1996). A test quantity T(Y,@), or measure of discrepancy, is a function of both the 
data and the parameters. For example, if the overall fit of the hypothesized structural 
equation model is to be evaluated, the test quantity may be the MWL fit function de- 
fined in Equation (4) or the GLS fit function in Equation (10), but treating both data and 
parameters as arguments. 

In classical overall goodness-of-fit testing, the null hypothesis states that the assumed 
model holds exactly in the population, ie., Ho : 79 = (009), where 7 = [pp, vech(Zp)’]’. 
Note that the composite null hypothesis contains unknown parameters. In the classical 
approach, the unknown parameters are replaced by the ML point estimates 6. The classi- 
cal p-values are defined as Pi =a > NTwwr(Ys, 6)) or PEG <3 > NTgrs(¥"s, 6)), 
where the probability is taken over a central chi-square distribution with p. — d degrees 
of freedom. Under the null hypothesis, the only source of random variation comes in 
the form of multivariate normal sampling of the observed data. 

Under the PPMC framework, potential test quantities in Equations (4) and (10) are 
treated as functions of both data Y and parameters 0. In other words, 0 in Tywt(Y, 0) 
is no longer fixed at the ML estimate 6, and Y is no longer only taken to mean the 
observed data Y?’®. When the distribution of the test quantity is based on the joint 
posterior distribution of the replicated data Y’’? and 8, the distribution of T(Y'’?, 0) is 
usually called predictive test quantities in the Bayesian literature. 

As a direct result of using the parameter posterior p(0|Y°"’, H) to quantify uncer- 
tainty in 0, T(Y°"S, @), or in other words, the test quantity with data fixed at the observed 
values but 6 randomly sampled from its posterior, is often referred to as realized test 
quantity in the Bayesian literature. Note that variation in T(Y?’’,@) comes from poste- 
rior variability of the parameters. 

Under the PPMC framework, the distribution of the predictive test quantity plays the 
role of the reference distribution. The distribution of the realized test quantity plays the 


role of the observed test statistics. Thus test quantities can be considered as generaliza- 
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tions of classical test statistics. 


Bayesian posterior predictive p-values for the test quantities can be defined as 


Bayesian p-value = Pr{T(Y"?, 0) > T(Y°"S, 6) |Y°"s, H} 


= | | 1{T(v"?, 0) > T(v, 0) } p(v?|9, H)p(alv, H)dedy?, (12) 
E JO 


where 1{T(Y’?,6) > T(Y°"S,@)} is an indicator function that takes on a value of 1 if 
and only if the event {T(¥’?,@) > T(Y°"S, @)} is true. That is, the Bayesian p-value for a 
given test quantity TY, 0) is defined as the probability that the replicated data, Y’’?, are 
more extreme than the observed data, Y°’’, as measured by the test quantity, where the 
probability is taken over the joint posterior distribution of Y’*? and 8. 

Bayesian p-values may be employed in the same manner as classical p-values, for 
rejecting the null hypothesis if the value is less than the nominal significance level w. 
From this hypothesis-testing perspective, however, the Type I error rates of the Bayesian 
p-value are known to be below the nominal a level resulting in conservative inferences. 
Theoretical properties of the Bayesian p-values are thoroughly examined in Robins, van 
der Vaart, and Ventura (2000) and Dahl (2006). 

Another perspective regarding the use of the Bayesian p-values, however, is that the 
Bayesian p-value provides a simple numerical summary of the degree of discrepancy 
between the reality and the model, rather than a rigid accept-rejection decision rule. 
This perspective is based on a widely accepted fundamental principle that all statistical 
models are wrong to some degree (Box, 1979), and thus the more pertinent issue is 
to characterize the ways in which the assumed model is wrong and what aspects of 
the model is useful for description, prediction, and synthesis (Cudeck & Henly, 1991, 
p- 512). From this perspective, graphical model checking has been advocated (Stern, 
2000; Gelman et al., 1996; Gelman, 2003, 2004, 2007), with Bayesian p-value serving as a 


numerical summary of the model-data discrepancy. 


A Poor Person’s PPMC 13 


2.4 Implementation 

As shown in Equations (11) and (12), implementing the PPMC method involves mul- 
tidimensional integration, which in most circumstances is analytically intractable. To 
circumvent analytical derivations of the posterior predictive distribution, Monte Carlo 
simulation methods are employed in practice. Specifically, a composition method is 
used. Note that by Equation (11), the joint posterior of Y’*? and @ can be obtained as a 
product: p(Y”?, 0/Y°"’, H) = p(Y'’?|0, H)p(0|Y™s, H). 


First, L sets of plausible parameters 6',..., 0" 


are sampled from the posterior distri- 
bution of the parameters p(6/Y’’, H). For each plausible parameter vector 6°, we draw 
one hypothetical replicated data set Y"’?’ from the sampling distribution p(Y|6‘, H). We 


then have L pairs of draws from the joint posterior distribution of Y’? and 9, i.e., 
(YO) BOP OV Hy LT sae (13) 


With sufficiently large number of draws, arbitrarily close approximations to functionals 
of p(Y’’?, 0/Y°"S, H) can be constructed with sample averages. 

For example, based on these draws, we can compare the distribution of the realized 
test quantities, T(/Y°’, 0°) against that of the predictive test quantities T(y”’?’, 0°). Cre- 
ating a scatterplot of the predictive test quantities (on the Y-axis) against the realized test 
quantities (on the X-axis) provides a convenient visual assessment of model adequacy. 
For correctly specified models, the points are expected to be evenly divided along the 
45-degree reference line. 

The Bayesian p-values defined in Equation (12) can be approximated by counting the 
proportion of draws for which the predictive test quantity exceeds its corresponding 


realized test quantity. That is, 


L 
Bayesian p-value ~ — ) 1 sree, ey ST, of) ; (14) 
l=1 


mle 
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This is equivalent to counting the proportion of points above the 45-degree reference line 
in a scatterplot of the two test quantities defined above. 
3 Poor-Person’s Posterior Predictive Model Checking 

Considering the role of the posterior distribution p(6|Y°’, H) in the construction of 
the posterior predictive distribution (see Equation 11), it is not too difficult to see that 
other distributions of the parameters could have been utilized (Gelfand, 1996, p.149). In 
fact, Box (1980) suggested the use of the prior distribution, whereas Bayarri and Berger 
(2000) proposed the use of the partial or conditional posterior distribution. The predictive 
model checking procedures based on these alternatives are now known as the prior pre- 
dictive model checking and partial (or conditional) posterior predictive model checking 
method, respectively. It has also been suggested that the cross-validation posterior distri- 
bution, which is the posterior distribution conditional on a subset of the observed data, 
can be used for predictive model checking purposes (Geisser, 1975; Gelfand, Dey, & 
Chang, 1992; Pena & Tiao, 1992). Notice that all of these distributions can be combined 


into a general expression 
p(v"?|A,H) = [ p(x"|9, H)p(0|A, Hd. (15) 


For example, with A being an empty set, the general expression in (15) becomes the 
prior predictive distribution, whereas with A being the observed data Y°"’, the general 
expression in (15) becomes the posterior predictive distribution in (11). With A equal 
to the set denoted by {Y°/T(Y°"S)}, where T(Y°’’) represents a test statistic summa- 
rizing a characteristic of Y°’ and / is used to indicate “partialing out” T(Y°"S) from 
Y's, the general expression in (15) becomes the partial posterior predictive distribution. 
Levy (2011) and others have noted that posterior predictive, prior predictive, and partial 
predictive model checking can be viewed as structurally similar. We wish to follow this 


line of reasoning and suggest an alternative that may provide a convenient framework 
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for predictive model checking. 

In this paper, we propose the use of a multivariate normal distribution with its mean 
vector equal to the ML estimate 6 and dispersion matrix equal to the asymptotic covari- 
ance matrix of the the ML estimate V (see Equation 9). That is, we propose that the 


distribution p(6|.A, H) in (15) be replaced by the following normal distribution, 


ga(0) = |207V|~1/? exp {30-60 1(-8)}. (16) 

This proposal is based on a well known result in the Bayesian literature that asymp- 
totically the contribution of the likelihood tends to dominate in the posterior. As a 
consequence, the shape of the posterior, to a first approximation, converges to a multi- 
variate normal with its mean centered around the ML estimate and dispersion matrix 
equal to the inverse of the information matrix (Gelman, 2003). 
3.1 Some Theoretical Basis 

The asymptotic posterior normality can be heuristically shown by expanding the log- 
likelihood ¢(6|Y) for fixed Y around 6 in a multivariate Taylor series, to second order: 


0°0(6|Y) 
0000" 


An 


6). (17) 


£(0|Y) = €(6|Y) 4 5(0 ay | (0 


Notice that the gradient vector of the log-likelihood vanishes in this expansion because 
6 is a stationary point. Given Equation (17), and using the fact that the posterior is 


proportional to the prior p(@) and the likelihood is exp{¢(6|Y) }, it can be shown that 


p(8|Y) « p(@) exp{2(alY) } 
1 aa 


~ p(6) exp{e(6|¥) + 5(8- 6)' | | (0 6)} 


— {-3(0 — 6)' [NZ(8)] (0 — ay 29h {-70 avo — ay} (18) 
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Recall that V = {NZ(8)} (Equation 9). On the final line, p(@) exp{¢(6|y)} can be 
absorbed into the proportionality constant because ((6|y) is a constant that does not 
involve 6 and p(@) is assumed to be diffuse (in comparison to the likelihood). Notice 
that the final expression in Equation (18) represents the kernel of the multivariate normal 
distribution with mean equal to 6 and the dispersion matrix equal to V~!, same as 
Equation (16). More rigorous justifications of posterior normality have been given by 
many authors, including Le Cam (1953) and Le Cam and Yang (2000). 

The asymptotic normality of the posterior implies that the method of maximum like- 
lihood can be regarded as a large-sample Bayesian procedure. This of course requires a 
conceptual shift from interpreting the parameters as fixed quantities to be uncovered by 
an estimator such as maximum likelihood to random variables that may have posterior 
variance conditional on the data and the model. From the multivariate normal ap- 
proximation to the posterior distribution of 0, the posterior predictive distribution, the 
posterior distributions of test quantities, and the Bayesian p-values can be constructed. 
Because we sample from a crude first-order approximation to the posterior, we term the 
procedure a Poor-Person’s posterior predictive model checking (PP-PPMC). 

3.2 Implementation 

To implement PP-PPMC, we first fit the hypothesized model H to the observed data 
Y's by minimizing the MWL fit function, which yields the ML estimates of the param- 
eters 6 and the associated asymptotic covariance matrix V. Second, L sets of plausible 
parameter vectors fe), (2) £4 /OL)y are simulated from the multivariate normal dis- 
tribution with mean and covariance matrix equal to 6 and V, respectively. Then for each 
6, a replicated data set Y"?( is drawn from the sampling distribution under the hy- 
pothesized model H, ie., p(Y|0, H). We now have L pairs of Y’? and @ samples, i.e., 
{ (vr), a) cyrer-2), g)).. (yrer(L), glt)) \. Using these L pairs of Y’? and 0 sam- 
ples, we can compute L pairs of predictive and realized values for a given test quantity 


T(Y,0), i.e. 2T(YPM, A) Tivos O()\ with 0 =1,...,L. Using these predictive and 
g Pp 
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realized test quantities, we can create scatterplots and approximate Bayesian p-values 
just as outlined for the original PPMC method. We term these predictive p-values PP- 


PPMC predictive p-values: 


L 
PP-PPMC predictive p-value ~ . yl {rer a) > T(Y™s, af) . (19) 
f=1 


Aside from requiring the ability to simulate multivariate normal deviates with given 
mean vector and covariance matrix, it is clear that the PP-PPMC procedure only requires 
byproducts of the likelihood-based estimation, i.e. 6 and V, which are routinely available 
in output from standard structural equation modeling software programs. The PP-PPMC 
method can offer a computationally efficient alternative to the original PPMC method, 
with no need for drawing samples from the full posterior (likely with MCMC). The PP- 
PPMC can also be an effective alternative to the classical likelihood-ratio chi-square test, 
offering a way to explicitly account for parameter estimation uncertainty. 

4 A Coupling Effect 

In the PP-PPMC approach, the use of the normal posterior approximation leads to 
an interesting coupling relation of the classical p-values and the PP-PPMC predictive 
p-values. The main result can be stated as follows. With the test quantity chosen to be 
an omnibus overall goodness-of-fit test statistic such as the MWL and GLS fit functions 


in (4) and (10), the PP-PPMC predictive p-value can be further approximated as 


Approximate PP-PPMC predictive p-value = Pr{T(Y”?, 0) > T(YS, @)| Ys, H} 


& Pr{x, —Xq > NT(Y", 6)}, (20) 


where T(Y°’,6) is the minimized value of the fit function based on observed data 
and Ge. and x? represent two independent chi-square random variables with p, = 
p+p(p+1)/2 and d degrees-of-freedom, respectively. A derivation of this result is 
in the Appendix. 


A Poor Person’s PPMC 18 


Recall that the classical p-values can be obtained by 
classical p-value = PEs <% > NT(Y°, 6)}. (21) 


The result in Equation (20) essentially suggests that, when the classical MWL or GLS 
fit functions are used as test quantities, the PP-PPMC predictive p-values can be further 
approximated non-iteratively. Importantly, the same classical model fit test statistics can 
be used, although a different reference distribution is needed. 

In other words, computing the approximate PP-PPMC predictive p-value requires 
only the minimum fit function chi-square value and a random number generator that can 
sample chi-square random variables. For example, given a model with p = 9 manifest 
variables and d = 24 degrees-of-freedom, if the freely available R programming language 
is used, the PP-PPMC predictive p-value associated with a minimum fit function chi- 


square of 50 can be obtained in as little as 3 lines of code: 


TE <= 503 p <— 95 do< 24; 
dist <- rchisq(10000,df=p* (pt1)/2+p) -rchisq(10000, df=d) 


print (mean(ifelse(dist > T, 1,0)),digit=3) 


Equations (20) and (21), when taken together, shed light on the relationship between 
PP-PPMC predictive p-values and classical p-values. It is interesting to note that the 
same minimum fit function value is used for obtaining both the predictive and classical 
p-values. Notice that the distribution of the difference i. — a has the same mean of 
px —d as the distribution of Gs jz but it has a larger variance 2(p, +d) than that of a 
ome , distribution, being 2(p.—d). This implies that the PP-PPMC method, as a result of 
accounting for parameter estimation uncertainty, tends to yield less stringent evaluations 


regarding model fit than the classical asymptotic theory based chi-square test. 
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5 Application to Empirical Data 

We use the well known Open-Book Closed-Book (OBCB) data set in Mardia, Kent, 
and Bibby (1979). This data set contains test scores from N = 88 examinees on p = 5 
subjects: Mechanics, Vectors, Algebra, Analysis and Statistics from 88 examinees. For 
this data set, it is well known that a two-factor correlated traits CFA model fits the 
data extraordinarily well (see e.g., Cai & Lee, 2009). The likelihood ratio chi-square test 
statistic is equal to 2.07 with 4 degrees-of-freedom and classical p-value of .72. 

We choose the test quantity to be the MWL discrepancy function in Equation (4) 
for the purpose of the assessment of overall model fit. This choice enables a direct 
comparison between the PP-PPMC method and the classical likelihood ratio chi-square 
test. Furthermore, the use of a chi-square type discrepancy function offers opportunities 
to evaluate the quality of approximation in Equation (20) in empirical data analysis. 

Under the PP-PPMC framework, we evaluate the global fit of four hypothesized mod- 
els including a two-factor CFA model (Hj), a congeneric test model (Hz), an essentially 
tau-equivalent model (H3), and a parallel test model (H4) to the OBCB data set. Notice 
that each of the four models is successively more restrictive than the preceding one. 

Specifically, we first obtain the maximum likelihood estimates of the model parame- 
ters and the associated asymptotic covariance matrices for each of the four hypothesized 
models. Then 0 and Y’? are simulated from the multivariate normal approxima- 
tion to joint posterior distribution, with p(6/Y°"’, H) replaced by Equation (16). Based 
on these simulated draws of @ and Y'’?, predictive and realized test quantities based on 
(4) and the associated PP-PPMC predictive p-values are obtained. Finally, the PP-PPMC 
predictive p-values are compared to Bayesian PPMC predictive p-values obtained from 


Mplus Version 7 (L. K. Muthén & Muthén, 1998-2012). 


Insert Figure 1 About Here 


Panel (a) in Figure 1 shows the scatterplot of the predictive versus realized test quan- 
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tities measuring overall discrepancy between the data and the two-factor CFA model 
(H;) for both the PP-PPMC and PPMC methods. The plus-signs represent the PPMC 
test quantities and the squares the PP-PPMC test quantities. It is clear that the distribu- 
tions are largely overlapping. The scattering of the predictive and realized discrepancies 
is evenly divided along the 45 degree line, yielding an estimated PP-PPMC predictive 
p-value of .556. The PPMC predictive p-value is equal to .577, indicating that the normal 
approximation in PP-PPMC resulted in a very similar estimate. The other panels (b), (c), 
and (d), provide the scatterplots comparing the predictive and realized discrepancies for 
models H2, H3, and Hy, respectively. 

In these figures, it is worth noting that the magnitude of realized discrepancies tend 
to increase, as successively more restrictive (and increasingly mis-specified) models are 
fitted to data. The PP-PPMC predictive p-values for the three models are .306, .245, 
and .000, respectively. Such a decrease in the PP-PPMC predictive p-values is entirely 


consistent with our expectations. 


Table 1: Model Fit Summary for the Open-Book Closed-Book Data 


p-values 
Model NTyywzr(Y,6) df Classical Bayesian PP-PPMC Approx. PP-PPMC 
Ay 2.073. 4 0.722 0.577 0.556 0.589 
A $978 5 0.110 0.283 0.306 0.308 
3 14.937. 9 0.093 = 0.253 0.245 0.216 
Ay 79.339 13 0.000 0.000 0.000 0.000 


Note. Tywt(Y°"s, 6) represents the minimum MWL fit-function value. 


Table 1 shows a summary of PP-PPMC, PPMC, and classical model fit results for the 
four models. Because the proposed PP-PPMC method employs standard maximum like- 
lihood estimation, the likelihood-ratio test statistics N Taw (Ys, 6) and the associated 
classical p-values are readily available. The last two columns show the PP-PPMC predic- 
tive p-values and the corresponding approximate PP-PPMC predictive p-values, found 


using the coupling approximation in Equation (20). From the Table we can conclude: 
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1) the PP-PPMC predictive p-values are close to the Bayesian PPMC values, 2) the PP- 
PPMC predictive p-values decrease as expected for more restrictive models, 3) the quality 
of coupling approximation is promising, and 4) the PP-PPMC predictive p-values tend 
to be less extreme than the corresponding classical p-values, again as expected. 
6 Poor-Person’s PPMC for Case Diagnostics 

The assessment of global model fit via PP-PPMC only tells us the overall extent to 
which the model is compatible with observed data. When the model fit is poor, more 
detailed inspections are required to discover the sources of misfit. In this section we 
illustrate the flexibility of the PP-PPMC method using the diagnosis of influential cases as 
an example. We wish to highlight the fact that other user-defined fit indices or altogether 
different aspects of model fit assessment can be studied with the same general approach. 

The influential cases differ markedly from other cases in the sample in the way they 
exert influence on model fit. In the context of SEM, the development of cases-influence 
diagnostics remains an open area of research. Some case-influence diagnostic methods 
are based on case-level residual estimates (Bollen & Arminger, 1991; Yuan & Zhong, 
2008; Yuan & Hayashi, 2010) and others are based on local influence analysis (Lee & 
Wang, 1996). With PP-PPMC, we show that the approach as outlined for global model 
fit assessment can be adapted to automate the generation of p-values and plots useful 
for case-influence diagnosis for any appropriately defined test quantity. 
6.1 Test Quantities for Case-influence Diagnosis 

Classification of Influential Cases In the context of regression modeling, influential 
cases can be classified into two categories. Specifically, cases with extreme predictor 
values are referred to as leverage points, while cases with large residuals (i.e. sitting far 
from the regression line) regardless of predictor values are referred to as outliers. When 
a leverage point has a large (or small) residual, the case is called a bad (good) leverage 
point (Rousseeuw & Leroy, 1987) in the sense that they lead to decrease (increase) in 


model fit. Notice that these definitions of outliers and leverage points depends on the 


A Poor Person’s PPMC 22 


model. An outlier in one regression model may turn out to be a good leverage point in 
another regression model. 

In applications of regression models it is well known that outliers and bad leverage 
points have disastrous effects on parameter estimation and model fit testing, whereas 
good leverage points lead to a more accurate regression coefficient estimates and im- 
provement in fit. Diagnostic measures for detecting outliers and leverage points are well 
developed and routinely available in the output of commercial software for regression 
analysis (Belsley, Kuh, & Welsch, 1980; Cook, 1986). 

In the context of factor analysis models, Yuan and Zhong (2008) showed that sim- 
ilar classifications of influential cases is possible when the factor analysis model is re- 
garded as a multivariate regression model with the factor scores playing the role of the 
predictors and the observed variables playing the role of responses. Outliers and lever- 
age points can then be defined using estimates of factor scores and residuals. In other 
words, an observation can be referred to as a leverage point when the associated factor 
score estimate is far from the center of the majority of the factor score estimates. And 
an observation can be called an outlier when the associated residual estimate is large, 
regardless of the value of factor score estimate. A bad leverage point will have large 
values both in factor score and residual estimates, whereas a good leverage point will 
have a small value in the residual estimate. As in regression, the definitions of outliers 
and leverage points are model-based, and thus the status of a case as an outlier or a 
leverage point could change under two different factor analysis models. The extensions 
to general structural equation modeling are examined in Yuan and Hayashi (2010). 

In this paper, we adopt the same classifications of the influential cases as defined and 
examined in Yuan and Zhong (2008) and Yuan and Hayashi (2010). The Bartlett formula 


is used for factor score estimation (see Yuan & Hayashi, 2010, p. 337 for some desirable 


A Poor Person’s PPMC 23 


properties of the Bartlett factor score estimates): 
-1 
a, = (a'¥A) Ay} (ys = ) (22) 


And test quantities are formulated so as to be sensitive to discrepancies between ob- 
served and replicated cases in their factor scores and residuals estimates. 


5 on model fit, it is natural to leave the 


To check the influence of the ith case ye 
observation in question out of the analysis and examine the change in model fit based 
on the remaining data points. This is a well-established method known as the leave- 
one-out method in cross-validation. In this paper, the sample with the i‘” case deleted is 
denoted as Yip. Let 8 (i) and Vii) denote the maximum-likelihood parameter estimates 
and the associated asymptotic covariance matrix, respectively, based on the leave-one-out 
sample Yip. 

The central idea of the case-influence diagnostic method based on PP-PPMC is to 
measure the divergence between factor score or residual properties associated with the 
obs 


i” observed case y’ 


and the potential replicated case y;’ . Note that the replicated 
case should be drawn from the posterior predictive distribution formed with the leave- 


one-out sample Yay. In other words, the parameter posterior is p(olYeiy, H) as opposed 
to p(0|Y°"S, H). In PP-PPMC, p(Olytiy, H ) is further approximated with a multivariate 


normal with mean vector 6(;) and covariance matrix V(j). 


Test Quantities Now let us introduce some notation useful for formulating and 
describing the test quantities to be employed for the case-influence diagnostics. Using 
the leave-one-out ML estimate bi), we can produce factor scores and residuals for the 
leave-one-out sample Yih. Let the centroid of the N — 1 factor scores for cases in Yay be 
denoted 7/;, and that of the residuals be &(;). Similarly, let the covariance matrix of the 


N — 1 factor scores be Q(;), and that of the residuals 37). 


Based on some parameter values in 6, we can compute factor score and residual 
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estimates for the ith case that was left out, yon . Let the factor score and residual estimates 


sobs 


°bS be denoted as 49° and 29", respectively. For the replicate observation voy let 


for y 

the factor score and residual estimates for y;” be denoted as #,” and 2;"", respectively. 
A natural choice of the leverage test quantity is the Mahalanobis distance of the factor 

score estimate of the ith (observed or replicate) case to the centroid of factor scores for 


the rest of the sample, 7,;,. Specifically, the realized test quantity is defined as 
ple, i). OP y, q y 
/ 
Theverage (yi, 0) = (4° >= ini) O% (a7 = iN) 2 (23) 


The predictive test quantity is 


| 
fo 
are 
3 
| 
SI 
—~ 
S 
NY 
~ 


rep -1/f(arep . 
Theverage (y; 8) (i) (a! - ili) . 
If the vast majority of the realized Mahalanobis distances are larger than the correspond- 
ing predictive Mahalanobis distances, then the predictive p-value for the ith case is close 
to 0 and thus, y?”> may be a potential leverage point. 

The test quantity for identifying outliers can be defined similarly. Specifically, the 


realized test quantity is defined as 
/ 
Toutier (¥#",0) = (27° — 2) By (e" —2@). (24) 


The predictive test quantity is 


Touttier (¥} 8) = (e? ~ zi) Zi) (e"” - 21) 


Again, if the vast majority of the realized Mahalanobis distances are larger than the cor- 


responding predictive Mahalanobis distances, then the predictive p-value for the ith case 


is close to 0 and thus, yer 


°’* may be a potential outlier. 
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6.2 Simulation Studies 

In this section, we investigate the performance of the PP-PPMC based case-influence 
diagnostics using simulated data. Two synthetic data sets of size equal to 206 are gener- 
ated from a CFA model with three correlated factors. Each of the three factors has three 
indicators. The model has 9 indicator variables and 23 degrees of freedom. Following 
the method developed by Yuan and Zhong (2008), 2 outliers, 2 good leverage points, 
and 2 bad leverage points are introduced, and the last 6 cases in the second data set are 
replaced by those 6 influential cases. No replacement is made for the first data set, and 
thus neither outliers nor leverage points are expected to exist. 

When the true generating model is fitted to each of the two data sets, the asymptotic 
chi-square test statistics were 23.75 and 37.50, yielding the associated classical p-values 
of 0.43 and 0.03, respectively. These results are expected due to the existence of outliers 
and bad leverage points in the second data set. It is anticipated that those six influential 
cases in the second data set and any such cases generated at chance levels in either of 
the two data sets can be flagged by the proposed PP-PPMC case-influence diagnostics. 
obs 


To determine whether or not y‘ 


°° is a leverage point (or outlier), we compute the 


(i) 


predr PY comparing the realized 


and predictive Mahalanobis distances of the factor score (or residual) estimate for the ith 


case-specific PP-PPMC predictive p-values, denoted by p 


case to the centroid of the factor score (or residual) estimates with the ith case removed. 

Panel (a) in Figure 2 presents the results for the first data set wherein no influen- 
tial cases were intentionally included. Each point in the plot represents a pair of the 
negative logarithm of the case-specific PP-PPMC predictive p-value for outliers against 
the negative logarithm of the corresponding case-specific PP-PPMC predictive p-values 
for leverage points. Due to the negative log transformation of the PP-PPMC predictive 
p-values, larger values on the horizontal (vertical) axis are more likely to be associated 
with leverage points (outliers). For ease of interpretation, dotted horizontal and vertical 


reference lines are added to the scatterplot at the value corresponding to the PP-PPMC 
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predictive p-value equal to 0.001. Thus, any observations located above the dotted hori- 
zontal (vertical) line can be flagged as potential outliers (leverage points). As expected, 
none of the cases were flagged as potential outliers or bad leverage points by the pro- 
posed method. Interestingly, case 174 is identified as a potential good leverage point, 


and when deleted, the model fit chi-square test statistic actually deteriorated slightly. 


Insert Figure 2 About Here 


Panel (b) in Figure 2 presents the same information for the second data set. As shown 
in the scatterplot, the proposed methods appear to have correctly identified most of the 
outliers and leverage points. For example, case 203 is identified as a potential good 
leverage point, and when deleted, the model fit chi-square statistic indeed deteriorated 
to 38.96 from 37.50. Case 205 is identified as a potential bad leverage point, and when 
deleted, the model fit chi-square statistic dropped significantly to 24.63 from 37.50. Other 
influential cases are also correctly identified as good leverage points, e.g. case 204, or 
outliers, e.g. case 202, or bad leverage points, e.g. case 206. Interestingly, case 201, added 
to this data set as an outlier, is not flagged as a potential outlier. Instead cases 6 and 159 
are flagged as potential outliers. 

These results lend initial support to the proposed PP-PPMC based case-influence 
diagnostics. Considering that there are more than one influential data points, and the 
proposed method is based on the leave-one-out method, and that standard maximum 
likelihood estimation may not be robust in the presence of multiple influential cases, the 
proposed PP-PPMC based method has exhibited surprisingly good performance. 

6.3. Open-book Closed-book Data 

In this section, we apply the proposed case-diagnostic method to the detection of 
influential cases in the analysis of the OBCB data. Although it is well known (and 
was seen in the previous section) that the two-factor CFA model fits well, it would be 


interesting to examine whether or not the existence of any outliers or bad leverage points 
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can influence the misfit of the model. Recall that the definitions of outliers and leverage 
points are model-based, implying that the results of the case-influence diagnostics can 
differ under different models. Thus, it would also be interesting to examine the case- 
influence diagnostics under alternative models such as a single-factor CFA model, and 


compare the results with those obtained from the two-factor CFA model. 


Insert Figure 3 About Here 


Figure 3 presents similar scatterplots as those shown in Figure 2. Panel (a) reveals 
that case 81 is a potential good leverage point under the two-factor CFA model, while 
Panel (b) reveals that the same case is a potential outlier under the single-factor CFA 
model. The response vector of the case 81 is (3,9,51,47,40), where the first two test 
scores are below the mean and the last three test scores are above the mean. This pattern 
of responses is strongly supportive of the two factor hypothesis (the first two tests are 
open-book) and is correctly identified as a good leverage point under the two-factor 
CFA model. For precisely the same reason, case 81 is correctly identified as an outlier 
under the single-factor CFA model. When case 81 is deleted, the overall model fit chi- 
square statistics actually increased slightly to 2.40 from 2.07 under the two-factor CFA 
model, while the chi-square statistic dropped significantly to 5.45 from 8.98 under the 
single-factor CFA model. 

7 Discussion 

Posterior predictive model checking has emerged as a flexible framework for both 
overall and targeted model-data fit assessment. When fully Bayesian methods are used 
to fit highly parameterized latent variable models, the output from posterior sampling 
(with MCMC) makes it straightforward to conduct PPMC. Recognizing the popularity 
of ML estimation in structural equation modeling, we propose a hybrid approach. 

We regard maximum likelihood as a form of large-sample Bayesian estimation pro- 
cedure and rely on posterior normality (the Bernstein-von-Mises phenomenon) to con- 


struct an approximate posterior predictive distribution using byproducts of maximum 
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likelihood estimation. In this Poor Person’s posterior predictive distribution, the exact 
parameter posterior is replaced by a multivariate normal distribution with mean vector 
equal to the ML estimate and covariance matrix equal to the inverse of the information 
matrix. 

We demonstrated the flexibility and computational efficiency of the PP-PPMC method 
with both overall model fit assessment and case-influence analysis as exemplary con- 
texts. We have also studied the relationship between classical p-values of model fit test 
statistics, PPMC predictive p-values, and the PP-PPMC predictive p-values. Using the 
Open-Book-Closed-Book data set, we provide an example of the similarity of predictive 
p-value estimates using the Bayesian PPMC and the proposed PP-PPMC methods. We 
establish a coupling relationship, and use it to demonstrate that overall model fit PP- 
PPMC predictive p-values can be approximated easily. It amounts to the adoption of a 
new reference distribution for standard model fit test statistics that respect the degree of 
uncertainty in estimating unknown model parameters. 

There are a number of important limitations to this research. First, the derivations are 
exclusively based on normal theory linear structural equation modeling. As observed 
variables depart from normality, the performance of the proposed PP-PPMC method 
remains unknown. While other distributions could be considered for the choice of the 
likelihood, such as a thicker-tailed t-distribution, we chose the normal distribution for 
analytic simplicity and to keep in line with standard procedures in structural equation 
modeling. Second, the posterior normality is a large-sample approximation that may or 
may not be appropriate for all model-data combinations. For smaller N, the posterior is 
less peaked and less ideally quadratic, again leading to unknown and potential perfor- 
mance issues. Comprehensive comparisons with alternative posterior approximations, 
e.g., fully Bayesian MCMC sampling or Rubin’s (1987) SIR algorithm, should be per- 
formed in future research and unfortunately remains out of scope for the current paper, 


which aims at introducing PP-PPMC. Finally, we have only considered overall fit and 
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case-influence. There are other aspects of model fit, e.g., residual dependence, that we 
have glossed over. We have not even discussed potentials of applying this idea to other 
forms of model fit indices, of which there are many in structural equation modeling. 
Nevertheless, we believe the PP-PPMC method to be a simple, non-iterative, and flexible 
alternative to both classical approaches as well as more modern fully Bayesian methods. 


We hope the initial evidence gathered here can prompt additional research. 
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Appendix: Theoretical Derivations of the Coupling Effect 
To show the result in Equation (20), we will use Tc15(Y, 0) presented in (10) as the test 
quantity because it is easier to manipulate. Recall from Equation (7), the large-sample 
distribution of s’? (vector of replicate sample first and second moments), conditional on 
6 and H is 
VN[s“? — o(8)||0,H © Ny, (0,0 [o(8)] 7"). (25) 


By Cochran’s theorem (Cochran, 1934), the quadratic form N[s’? — 7 (0)|'T[a(6)|[s”? — 
a (0)] is approximately chi-square distributed with p, degrees of freedom (see also The- 
orem 10.9 in Schott, 2005). To derive the desired result in Equation (20), we need the 


following two propositions. 


Proposition1 Given @ and H, the quadratic form N[s”? — 0 (6)|'T[a(8)|[s”’? — 7 (8)| 


can also be written as NTgrs(Y"?, 0)|0, H, and thus we have 
NTcxs(¥?, 6)|0,H ~ x5, (26) 
Due to asymptotic posterior normality of 8, we have approximately 
elves, 2 Ni; (6,0). 


Using continuous mapping and the multivariate delta method, an approximate posterior 


distribution of a (0) is 
o(8) |¥°", H & Np, (7 (8), 4(8) VA(6)') (27) 
where A(6) is the Jacobian matrix in Equation (3) evaluated at 6. It follows that 


(a (8) —s°*) — (o(8) — 8) | Y", H © Np, (0, A(6) VA(6)’), (28) 
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where ss is the vector of observed sample first and second moments. From Equation (9), 


the covariance matrix V is equal to [A(6)/NT(s°s)A(6)|~1. Consequently, the approxi- 


An 


mate distribution of the quadratic form N[(a(0) — s°°°) — (7 (6) — s°8)|/T (ss) |(@(0) — 
s°’s) — (7 (6) — s°5)], conditional on Y°"S and H, is a central chi-square with d degrees 


of freedom. Using this result and after some algebra, we find the difference between two 


An An 


quadratic forms N(o (0) — s°5)/T(ss) (a (0) — s°"°) — N(o (6) —s°"5)/T (ss) (o (6) — 5°") 
is distributed as central chi-square random variable with d degrees of freedom, condi- 
tional on YS and H. But notice that the difference in these two quadratic forms can be 


equivalently expressed as NTgrs5 (YS, 0) — Tos(Y"S, 6) | Y°"S, H. 


Proposition 2. Thus the following result holds: 
NTots(¥°", 8) — NTers(¥°", 8)|¥", H * x3. (29) 


Coupling Using the two propositions stated in (26) and (29), we can show the key 


result in Equation (20) as follows: 


Pr{Tcrs(¥"?, 0) > Ters(¥"*, 8)|Y", H} 

= Pr{NTois(¥"?, 0) > NTgrs(¥"*, 8) |Y"", H} 

= i UNTors(¥"?, 0) > NTers(Y, 8)} p(¥"? |8, H) p(@|Y", H) ded”? 
= i, UNTors(Y"?, 0) > NTors(¥"s, 8) }p(¥"? |8, H)dY"? p(o/Y°es, H)d0 
re [ Pr{x3. > NToxs(Y’, 6) |8, H} p(6|¥°, H)d0 


= [Pr {ape > NTexs(¥, 8) + NTois(¥", 6) — NTgis(¥", 8) 
6 


0H} p(0|Y°"s, H)de 


x Pr {xe = ne > NTcrs(Y°%s, 6)} . (30) 
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(a) Hy: Two-Factor CFA Model, PP-PPMC P preq = .556, 
PPMC Popred = .577 
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(c) H3: Essentially tau-equivalent Model, PP-PPMC 
P pred = .245, PPMC Ppred = 253 
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(b) Hz: Congeneric Test Model, PP-PPMC Ppreq = .306, 
PPMC Ppred = -283 
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(d) Hg: Parallel Test Model, PP-PPMC P preq = .000, 
PPMC Ppreq = -000 


Figure 1: Scatterplots of PP-PPMC and PPMC predictive discrepancies against realized 
discrepancies for the Open-book Closed-book data set and the four fitted models H1, Ho, 


Hs, and Ag. 
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Figure 2: Scatterplots of the Negative Logarithm of the Case-specific PP-PPMC Predic- 
tive p-values for Leverage Points Diagnostics versus the Corresponding Negative Loga- 
rithm of the Case-specific PP-PPMC Predictive p-values for Outliers Diagnosis 


Outliers 


Figure 3: Scatterplots of the Negative Logarithm of the Case-specific PP-PPMC Predic- 
tive p-values for Leverage Points Diagnostics versus the Corresponding Negative Loga- 
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Leverage Points 


(a) Two-Factor CFA Model 


Leverage Points 


(b) Single-Factor CFA Model 


rithm of the Case-specific PP-PPMC Predictive p-values for Outliers Diagnosis 
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